NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

MaSk-LMM: A Matrix Sketching Framework for Linear Mixed Models in Association Studies

Burch, Myson; Bose, Aritra; Dexter, Gregory; Parida, Laxmi; Drineas, Petros (November 2024, International Conference on Research in Computational Molecular Biology (RECOMB))

Full Text Available
Matrix sketching framework for linear mixed models in association studies

https://doi.org/10.1101/gr.279230.124

Burch, Myson; Bose, Aritra; Dexter, Gregory; Parida, Laxmi; Drineas, Petros (September 2024, Genome Research)

Linear mixed models (LMMs) have been widely used in genome-wide association studies to control for population stratification and cryptic relatedness. However, estimating LMM parameters is computationally expensive, as it requires large-scale matrix operations to build the genetic relationship matrix (GRM). Over the past 25 years, Randomized Linear Algebra has provided alternative approaches to such matrix operations by leveraging matrix sketching, which often yields provably accurate, fast, and efficient approximations. We leverage matrix sketching to develop a fast and efficient LMM method called Matrix-Sketching LMM (MaSk-LMM), which sketches the genotype matrix to reduce its dimensions and speed up computations. Our framework comes with theoretical guarantees and shows strong empirical performance compared with the current state of the art for simulated traits and complex diseases.
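The abstract describes sketching the genotype matrix to reduce the cost of building the GRM. The Python sketch below illustrates that general idea under stated assumptions; it is not the MaSk-LMM algorithm. It assumes a standardized n x m genotype matrix X, the common GRM definition K = X X^T / m, and a Gaussian sketching matrix S of size m x s with i.i.d. N(0, 1/s) entries, so that E[S S^T] = I. All function names and the toy data are hypothetical.

```python
import math
import random

# Illustrative Gaussian sketch of a GRM; NOT the MaSk-LMM implementation.


def standardize(G):
    """Center and scale each marker (column) of an n x m genotype matrix
    coded 0/1/2; monomorphic markers (zero variance) become all zeros."""
    n, m = len(G), len(G[0])
    X = [[0.0] * m for _ in range(n)]
    for j in range(m):
        col = [G[i][j] for i in range(n)]
        mu = sum(col) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / n)
        for i in range(n):
            X[i][j] = (col[i] - mu) / sd if sd > 0 else 0.0
    return X


def matmul(A, B):
    """Product of two list-of-lists matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def outer_gram(Y, scale):
    """Return Y Y^T / scale for an n x k matrix Y."""
    return [[sum(a * b for a, b in zip(ri, rj)) / scale for rj in Y] for ri in Y]


def grm(X):
    """Exact GRM K = X X^T / m; the n x n product costs O(n^2 m)."""
    return outer_gram(X, len(X[0]))


def gaussian_sketch(m, s, rng):
    """m x s sketching matrix with i.i.d. N(0, 1/s) entries, so E[S S^T] = I."""
    scale = 1.0 / math.sqrt(s)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(s)] for _ in range(m)]


def sketched_grm(X, S):
    """Approximate GRM (X S)(X S)^T / m; the n x n product costs O(n^2 s)."""
    return outer_gram(matmul(X, S), len(X[0]))


def rel_frobenius_error(A, B):
    """Relative error ||A - B||_F / ||A||_F."""
    num = sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb))
    den = sum(a * a for ra in A for a in ra)
    return math.sqrt(num / den)


# Hypothetical toy data: 20 individuals, 500 random markers, sketch size 100.
rng = random.Random(0)
n, m, s = 20, 500, 100
G = [[rng.choice([0, 1, 2]) for _ in range(m)] for _ in range(n)]
X = standardize(G)
K = grm(X)
K_sketch = sketched_grm(X, gaussian_sketch(m, s, rng))
print(f"relative Frobenius error: {rel_frobenius_error(K, K_sketch):.3f}")
```

Because E[S S^T] = I, the sketched matrix is an unbiased estimate of K, and a larger sketch size s trades speed for accuracy. The sketching scheme MaSk-LMM actually uses, along with its guarantees, is described in the paper.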
Full Text Available
Structure-informed clustering for population stratification in association studies

https://doi.org/10.1186/s12859-023-05511-w

Bose, Aritra; Burch, Myson; Chowdhury, Agniva; Paschou, Peristera; Drineas, Petros (October 2023, BMC Bioinformatics)

Abstract
Background: Identifying variants associated with complex traits is a challenging task in genetic association studies because of linkage disequilibrium (LD) between genetic variants and population stratification unrelated to disease risk. Existing methods of population structure correction use principal component analysis or linear mixed models with a random effect when modeling associations between a trait of interest and genetic markers. However, because of stringent significance thresholds and latent interactions between the markers, these methods often fail to detect genuinely associated variants.
Results: To overcome this, we propose CluStrat, which corrects for complex, arbitrarily structured populations while leveraging the LD-induced distances between genetic markers. It performs agglomerative hierarchical clustering using the Mahalanobis distance covariance matrix of the markers. In simulation studies, we show that our method outperforms existing methods in detecting true causal variants. Applying CluStrat to the WTCCC2 and UK Biobank cohorts, we found biologically relevant associations in schizophrenia and myocardial infarction. CluStrat was also able to correct for population structure in polygenic adaptation of height in Europeans.
Conclusions: CluStrat highlights the advantages of biologically relevant distance metrics, such as the Mahalanobis distance, which captures the cryptic interactions within populations in the presence of LD better than the Euclidean distance does.
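To make the two ingredients named in this abstract concrete, the following Python sketch computes Mahalanobis distances between individuals from the marker covariance matrix and then runs a simple agglomerative (bottom-up) clustering. This is a toy illustration, not the CluStrat implementation. The ridge term added to the covariance, the average-linkage merge rule, the function names, and the genotypes are all assumptions chosen for the example.

```python
import math

# Toy Mahalanobis distance + agglomerative clustering; NOT the CluStrat code.


def covariance(X, ridge=1e-3):
    """m x m marker covariance of an n x m matrix, plus ridge * I (an assumed
    regularizer) so the matrix is invertible."""
    n, m = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(m)]
    C = [[sum((row[a] - mu[a]) * (row[b] - mu[b]) for row in X) / (n - 1)
          for b in range(m)] for a in range(m)]
    for a in range(m):
        C[a][a] += ridge
    return C


def invert(A):
    """Gauss-Jordan inversion with partial pivoting (assumes A is non-singular)."""
    m = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(m)] for i, row in enumerate(A)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(m):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[m:] for row in M]


def mahalanobis(x, y, P):
    """sqrt((x - y)^T P (x - y)), where P is an inverse covariance matrix."""
    d = [a - b for a, b in zip(x, y)]
    q = sum(d[a] * P[a][b] * d[b] for a in range(len(d)) for b in range(len(d)))
    return math.sqrt(max(q, 0.0))  # guard against tiny negative round-off


def average_linkage(D, k):
    """Start from singletons and repeatedly merge the two clusters with the
    smallest mean pairwise distance until k clusters remain."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(D[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy genotypes: 6 individuals x 3 markers.
G = [[0, 0, 1], [0, 1, 0], [1, 0, 0], [2, 2, 1], [2, 1, 2], [1, 2, 2]]
P = invert(covariance(G))
D = [[mahalanobis(x, y, P) for y in G] for x in G]
print(average_linkage(D, k=2))
```

Because the Mahalanobis metric weights differences by the inverse covariance, directions along which markers are strongly correlated (as with LD) contribute less than they would under a Euclidean distance.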
Can polygenic risk scores help explain disease prevalence differences around the world? A worldwide investigation

https://doi.org/10.1186/s12863-023-01168-9

Jain, Pritesh R.; Burch, Myson; Martinez, Melanie; Mir, Pablo; Fichna, Jakub P.; Zekanowski, Cezary; Rizzo, Renata; Tümer, Zeynep; Barta, Csaba; Yannaki, Evangelia; et al. (November 2023, BMC Genomic Data)

Abstract
Complex disorders are caused by a combination of genetic, environmental, and lifestyle factors, and their prevalence can vary greatly across populations. The extent to which genetic risk, as identified by genome-wide association studies (GWAS), correlates with disease prevalence in different populations has not been investigated systematically. Here, we studied 14 complex disorders and explored whether polygenic risk scores (PRS) based on current GWAS correlate with disease prevalence within Europe and around the world. A clear variation in GWAS-based genetic risk was observed based on ancestry, and we identified populations that have a higher genetic liability for developing certain disorders. We found that for four of the 14 studied disorders, PRS significantly correlates with disease prevalence within Europe. We also found significant correlations between worldwide disease prevalence and PRS for eight of the studied disorders, with multiple sclerosis genetic risk having the highest correlation with disease prevalence. Based on current GWAS results, across-population differences in genetic risk for certain disorders can potentially be used to understand differences in disease prevalence and to identify populations with the highest genetic liability. The study highlights both the limitations of PRS based on current GWAS and the fact that, in some cases, PRS may already have high predictive power. This could be due to the genetic architecture of specific disorders or to increased GWAS power in some cases.
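The core computation behind this abstract is scoring individuals with a PRS and then correlating a population-level summary with disease prevalence. The Python sketch below shows one plausible version of that pipeline. The study's exact procedure (SNP selection, the population-level summary, and the correlation statistic) is not given here, so the mean-PRS summary, the Pearson correlation, the function names, and every number in the example are assumptions and made-up values.

```python
import math

# Toy PRS-vs-prevalence correlation; all inputs below are hypothetical.


def polygenic_score(dosages, weights):
    """PRS for one individual: sum over SNPs of effect-allele dosage (0, 1 or 2)
    times the GWAS effect size (e.g. a log odds ratio)."""
    return sum(d * w for d, w in zip(dosages, weights))


def mean_prs(cohort, weights):
    """Average PRS over the genotyped individuals of one population."""
    return sum(polygenic_score(g, weights) for g in cohort) / len(cohort)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical inputs: effect sizes for 3 SNPs, genotypes for 3 populations,
# and made-up prevalence figures. None of these numbers come from the study.
weights = [0.12, -0.05, 0.30]
cohorts = {
    "pop_A": [[0, 1, 2], [1, 1, 1], [0, 2, 1]],
    "pop_B": [[2, 0, 1], [1, 0, 0], [2, 1, 0]],
    "pop_C": [[1, 2, 2], [2, 1, 2], [1, 1, 2]],
}
prevalence = {"pop_A": 0.011, "pop_B": 0.006, "pop_C": 0.015}

names = sorted(cohorts)
scores = [mean_prs(cohorts[p], weights) for p in names]
r = pearson(scores, [prevalence[p] for p in names])
print(f"Pearson r between mean PRS and prevalence: {r:.2f}")
```

With real data, a correlation of this kind would be computed across many populations and tested for significance for each disorder separately.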
CluStrat: A Structure Informed Clustering Strategy for Population Stratification

https://doi.org/10.1007/978-3-030-45257-5_19

Bose, Aritra; Burch, Myson; Chowdhury, Agniva; Paschou, Peristera; Drineas, Petros (January 2020, Research in Computational Molecular Biology)

Full Text Available
